CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of International Application No. PCT/EP02/05216 filed 13 May 2002, which claims priority to Austrian Application No. A 777/2001 filed 16 May 2001, the entire disclosures of which are incorporated herein by reference.

BACKGROUND OF THE INVENTION 

The invention relates to a method for analysing DNA of sweet potato. After Columbus introduced the sweet potato to Spain, it spread to Africa, India, Asia and Oceania and became an important crop in those parts of the world. It is possible that the spread of the sweet potato outside America was restricted to a limited number of genotypes. Contrary to this supposition, a wide variety of phenotypes (genotypes) can be found all over the world, which could be a consequence of the high level of heterozygosity found in sweet potato. The sweet potato is an out-crossing hexaploid, and the variation due to sexual reproduction and somatic mutation can be maintained through vegetative propagation.

Several germplasm collections exist throughout the world; the CIP (Lima) has assembled more than 4000 accessions of sweet potato. Maintaining such a large number of varieties is a huge effort, which makes it important to quantify the level of diversity among the sweet potato accessions so that the number of stored samples can be reduced, thus facilitating germplasm conservation.

Several marker systems that could be applied to the sweet potato were developed for genotyping during the last decades, such as RAPD (Jarret et al., 1992; Gichuki et al.), SSR (Tautz 1988) and AFLP (Zabeau and Vos 1993; Gichuki et al.). Amplified Fragment Length Polymorphism (AFLP) and Simple Sequence Repeats (SSR), or microsatellites, have recently become popular in fingerprinting and phylogenetic studies. It has also been reported that AFLP assays have better reproducibility across laboratories than RAPDs (Jones et al., 1997); however, AFLP sites were shown to be clustered within the genome, which makes the construction of linkage maps difficult.

Waugh and co-workers have developed a new method called Sequence-Specific Amplified Polymorphism (S-SAP) (Waugh et al., 1997). This method is similar to AFLP, but the S-SAP system produces amplified fragments containing a long terminal repeat (LTR) sequence of a retrotransposon at one end and a flanking adapter sequence, ligated to a host restriction site, at the other end. Individual retrotransposon insertions are thus displayed as bands on a sequencing acrylamide gel (Ellis et al., 1998; Waugh et al., 1997).
Waugh et al., using the original AFLP protocol, digested barley genomic DNA with two restriction endonucleases, a rare-cutting (PstI) and a frequent-cutting (MseI) enzyme, and ligated adapters specific to the restriction sites. The procedure consists of two consecutive PCRs (polymerase chain reactions). In the first, the digested template DNA was pre-amplified to select and bulk restriction fragments of the correct size and configuration, using primers (P and M) homologous to the adapter sequences. In the second, selective PCR, a radiolabelled (γ-ATP) Bare-1-like LTR oligonucleotide and P+ or M+ selective adapter primers (PstI- or MseI-specific primers with 1-3 selective nucleotides) were added. The P+ and M+ primers had the same sequence as the P and M primers in the first reaction but included one to three additional selective nucleotides at the 3' end. The touchdown PCR protocol of Vos et al. (1995) was followed exactly.

A considerable advantage of a retrotransposon-based polymorphic marker system stems from the fact that Class I retrotransposons transpose via an RNA intermediate, which they convert to DNA by reverse transcription before reinsertion, whereas the parental element remains fixed in the genome (see reviews by Boeke 1989; Kumar 1996). This means that an inserted retrotransposon does not change its position during the evolution of the genome, but every new insertion increases the polymorphism and the size of the genome. However, solo LTR sequences found in different genomes indicate that unequal crossing over and/or intrachromosomal recombination events can delete inserted retrotransposon sequences (Shirasu et al., 2000).

Retrotransposons are present in the genomes of all plants, ranging from single-cell algae to angiosperms and gymnosperms. They are usually present in high copy number (from hundreds to millions), and a high level of heterogeneity has been observed among them, with amino acid similarities between individual fragments varying from 5 to 75% (Flavell et al. 1992a Mol Gen). Compared with Drosophila copia, fungal Ty1 or even animal retrotransposons, plant retrotransposons show a considerable degree of sequence heterogeneity and insertional polymorphism, both within and between species (Flavell 1992; Boeke and Corces 1989). The most studied group of LTR retrotransposons is the Ty1-copia group, named after the best-studied elements in Saccharomyces cerevisiae and Drosophila melanogaster (Boeke and Corces 1989; Grandbastien 1989; Schmidt 1996). The LTR sequences are positioned as direct repeats at both ends of the retrotransposon. Different retrotransposon families have different (non-cross-hybridising) LTR sequences. The 5' and 3' LTR sequences are identical at the time of insertion, but they can diverge through mutation over time.
Phylogenetic analyses of retrotransposon sequences show, with some significant exceptions, that the degree of sequence divergence in Ty1-copia retrotransposon populations between any pair of species is generally proportional to the evolutionary distance between those species (Flavell, 1992b). Several authors have also hypothesised that transposition could increase the genetic variability necessary for organisms to adapt to different environmental conditions and that it may be a major factor in the evolution of higher plants (McClintock, 1984; Schwarz-Sommer and Saedler, 1988; Wendel and Wessler 2000). The chromosomal distribution of the Ty1-copia group of retrotransposons in plants has been studied by in situ hybridisation on metaphase chromosomes, revealing that these elements are dispersed throughout the euchromatic and heterochromatic regions of all chromosomes in plants (Pearce 1996; Schmidt 1996; Heslop-Harrison 1997).

Retrotransposon insertion is not a random event but is controlled by the element itself and by signals that depend on the host organism and on external factors. Stresses and environmental challenges are known to stimulate the expression or transposition of mobile elements (Mhiri et al., 1997; Grandbastien et al., 1997).

Despite their abundant distribution, most retrotransposon sequences are inactive because mutations have rendered their structures defective. The only retrotransposons known to be active and mobile are Tto1, Tnt1 and Tnp2 of tobacco and Tos17 of rice (Grandbastien 1989; Vaucheret 1992; Hirochika 1993; Hirochika 1996; Vernhettes 1997; Okamoto 2000), the Bare-1 element of barley and PDR1 of pea (Pearce et al., 1997; Ellis et al., 1998).

The ubiquitous distribution, high copy number and widespread chromosomal 
dispersion of the retrotransposons in plants provide excellent potential for developing 
a multiplex, DNA-based marker system. 


Several retrotransposon-based marker systems have been reported recently.

Purugganan et al. (1995) analysed restriction site polymorphism in a limited region of the Magellan retrotransposon and were able to discriminate even closely related Zea mays subspecies. Waugh et al. published the S-SAP method on barley in 1997 and found that the level of polymorphism it revealed was about 25% higher than that revealed by AFLP. Ellis et al. (1998) amplified sequences between the polypurine tract of the PDR1 retrotransposon and a 3' TaqI (frequent-cutting enzyme) specific adapter sequence, while Pearce et al. (2000) used the same S-SAP technique with two other pea retrotransposon LTR sequences (Tps12 and Tps19), but generated amplified fragments between the 5' LTR and flanking adapter (TaqI) sequences. Both primers contained selective nucleotides. Both experiments resulted in a detailed picture of the intra- and interspecies relationships within the genus Pisum. Gong-Xiu Yu and R.P. Wise combined AFLP, RAPD and S-SAP markers to construct a saturated map of diploid Avena based on a recombinant inbred population. Consistent with Waugh's results on barley, they also found that the S-SAP-generated markers were more evenly distributed across the Avena genome.

Although Waugh et al. postulated that their approach could be used as a general approach to obtain linkage information on a range of other conserved sequences in the barley genome, and that it could also be applied to any other species, it turned out that this S-SAP approach cannot generally be applied to the phylogenetic analysis of any plant species, not even to plant species similar to barley. One reason for this is that the retrotransposon approach according to Waugh et al. is highly dependent on the specific retrotransposon sequence chosen and also on the general variability of transposon "jumping".

SUMMARY OF THE INVENTION 
It is an object of the present invention to provide a method for analysing DNA of sweet potatoes that allows phylogenetic and linkage analysis of sweet potato, and to provide means for performing this method.

Therefore, the present invention provides a method for analysing DNA of a sweet potato, characterised by the following steps:
- providing DNA of a sweet potato,
- physically breaking said DNA into DNA pieces,
- introducing known sequences at at least one of the two ends of each DNA piece,
- providing at least two primers, a first primer according to the formula

(Nx)n AGTCCTAACA N1N2N3 (I)

wherein Nx is selected from A, C, G and T; n is 0 to 20; N1 is G, T, A or not present; N2 is A, C, G or not present; N3 is A, G, C or not present; or a complementary sequence thereto; and a second primer being able to anneal to the introduced sequence,
- amplifying DNA of the DNA pieces with said primers, and
- analysing said amplified DNA.
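The constraints of formula (I) can be checked mechanically when designing or ordering primers. The following Python sketch is illustrative only and is not part of the described method. It assumes a nested reading of the optional positions (N2 is only present if N1 is, and N3 only if N2 is), which is consistent with the preferred first primers given in this description, and it interprets "complementary sequence" as the reverse complement. The function names are hypothetical.

```python
import re

CORE = "AGTCCTAACA"  # invariant core motif of formula (I)

# (Nx)n with n = 0..20, then the core, then the optional N1N2N3 extension.
# Assumption: nested optional positions (N2 requires N1, N3 requires N2).
# N1 in {G, T, A}; N2 in {A, C, G}; N3 in {A, G, C}.
FORMULA_I = re.compile(r"^[ACGT]{0,20}" + CORE + r"(?:[GTA](?:[ACG][AGC]?)?)?$")

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def matches_formula_i(primer: str) -> bool:
    """True if the primer, or its reverse complement, fits formula (I)."""
    seq = primer.upper()
    return bool(FORMULA_I.match(seq) or FORMULA_I.match(reverse_complement(seq)))


print(matches_formula_i("AGACTAAGAGTCCTAACAG"))  # True  (N1 = G is allowed)
print(matches_formula_i("AGACTAAGAGTCCTAACAC"))  # False (N1 = C is excluded)
```

Because C is not permitted at N1, a primer ending in ...AACAC is rejected, mirroring the 3' LTR exclusion described for the first primer.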
Surprisingly, it turned out with the present invention that a method similar to the one applied by Waugh et al. may be used for analysing sweet potato DNA and for the phylogenetic and linkage analysis of different sweet potato individuals from genetically different sweet potato races. It turned out that a specific retrotransposon of sweet potato, the Str187 retrotransposon, is extremely suitable for analysing and distinguishing even otherwise very closely related sweet potato individuals and allows a clear and distinct phylogenetic grouping of these individuals. In general, the present method uses a primer designed to the 5' LTR of the Str187 retrotransposon together with a primer located 5' of said 5' LTR sequence on an introduced piece of DNA.

The Str187 LTR primer proved to be the most polymorphic of all sequences tested, and the sweet potato individuals analysed were found to have extremely high variability in the number of inserts. Indeed, the gradual increase in the number of integration sites indicates that the Str187 retrotransposon was active in the recent past and may still be active.

The method according to the present invention further turned out to be much more reliable and specific than other methods tested for this purpose in other plant genomes, such as RAPD or AFLP.

It is therefore possible to distinguish closely related sweet potato races genetically and to allocate them to specific origins.

There are a number of methods known for physically breaking DNA into pieces. Most prominent are statistical or defined restriction endonuclease digestion and mechanical breaking, e.g. by sonication. According to the present invention, it is preferred to break the DNA by restriction endonuclease digestion, preferably by digestion with at least a 6 bp cutting enzyme, especially EcoRI.
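The fragment pattern that EcoRI produces on a known sequence can be predicted in silico. The sketch below is a simplification and not part of the described method: it scans one strand only, ignores the 4-nt 5' overhang (AATT) on the complementary strand and treats the input as linear. Under the additional assumption of a random sequence with equal base frequencies, a 6 bp recognition site is expected once every 4^6 = 4096 bp, which is why a 6 bp cutter yields fewer and longer fragments than a 4 bp cutter (expected once every 4^4 = 256 bp). The function names are hypothetical.

```python
ECORI_SITE = "GAATTC"
CUT_OFFSET = 1  # EcoRI cleaves between G and A: G^AATTC


def ecori_digest(seq: str) -> list[str]:
    """Split a linear sequence at every EcoRI site (top strand only)."""
    seq = seq.upper()
    cut_positions = []
    hit = seq.find(ECORI_SITE)
    while hit != -1:
        cut_positions.append(hit + CUT_OFFSET)
        hit = seq.find(ECORI_SITE, hit + 1)

    fragments, start = [], 0
    for cut in cut_positions:
        fragments.append(seq[start:cut])
        start = cut
    fragments.append(seq[start:])
    return fragments


def expected_site_spacing(site_length: int) -> int:
    """Mean spacing between sites in random DNA with equal base frequencies."""
    return 4 ** site_length


print(ecori_digest("AAGAATTCTTGAATTCCC"))  # ['AAG', 'AATTCTTG', 'AATTCCC']
print(expected_site_spacing(6))            # 4096
```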


The first primer to be used within the method according to the present invention efficiently amplifies from the 5' LTR of the Str187 retrotransposon. Therefore, the primer preferably comprises in its (Nx)n region further residues matching said 5' LTR region. (Nx)n is therefore preferably e.g. TAAGACTAAG (SEQ ID NO:2) or AGACTAAG, or even longer sequences from the 5' LTR.

Since 5' LTR sequences are identical or at least highly similar to 3' LTR sequences, amplification of DNA pieces comprising 3' LTR sequences might have a negative effect on the method according to the present invention. Therefore, the primers are preferably designed in a way that excludes amplification of sequences lying 5' of 3' LTR sequences, e.g. by providing G, T or A as N1 (because the first base 5' of the 3' LTR is a G). Such primers may also be used in a second round of performing the present invention, e.g. if the number of differences is too high without such a limitation.

Preferred first primers are therefore selected from:
AGACTAAGAGTCCTAACA (SEQ ID NO:3),
AGACTAAGAGTCCTAACAG (SEQ ID NO:4),
AGACTAAGAGTCCTAACAT (SEQ ID NO:5),
AGACTAAGAGTCCTAACAA (SEQ ID NO:6),
AGACTAAGAGTCCTAACAGC (SEQ ID NO:7),
AGACTAAGAGTCCTAACAGA (SEQ ID NO:8),
AGACTAAGAGTCCTAACAGG (SEQ ID NO:9),
AGACTAAGAGTCCTAACATA (SEQ ID NO:10),
AGACTAAGAGTCCTAACATG (SEQ ID NO:11),
AGACTAAGAGTCCTAACATC (SEQ ID NO:12),
AGACTAAGAGTCCTAACAAA (SEQ ID NO:13),
AGACTAAGAGTCCTAACAAG (SEQ ID NO:14),
AGACTAAGAGTCCTAACAAC (SEQ ID NO:15),
or fragments thereof, said fragments optionally comprising at least 10 bp of the 3' part of these sequences.
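The thirteen full-length primers listed above follow from fixing (Nx)n to AGACTAAG and combining the permitted N1 (G, T or A) and N2 (A, C or G) residues. The short Python sketch below reproduces that enumeration as a bookkeeping aid; the constant and function names are hypothetical.

```python
from itertools import product

PREFIX = "AGACTAAG"    # (Nx)n part taken from the Str187 5' LTR
CORE = "AGTCCTAACA"    # invariant core of formula (I)
N1_CHOICES = "GTA"     # C is excluded at N1 (3' LTR discrimination)
N2_CHOICES = "ACG"


def preferred_primers() -> list[str]:
    """Unextended primer, all +1 primers and all +2 primers."""
    base = PREFIX + CORE
    primers = [base]
    primers += [base + n1 for n1 in N1_CHOICES]
    primers += [base + n1 + n2 for n1, n2 in product(N1_CHOICES, N2_CHOICES)]
    return primers


primers = preferred_primers()
print(len(primers))  # 13
```

The count breaks down as 1 unextended primer, 3 primers with one selective nucleotide and 3 × 3 = 9 primers with two.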

The introduction of known sequences at at least one of the two ends of each DNA piece (preferably, of course, at the 5' end) preferably comprises cutting the DNA with a restriction enzyme, optionally making blunt ends (depending also on the restriction enzyme), and ligating an adapter to the end. This adapter comprises e.g. a known sequence to which said second primer is designed to anneal. Instead of making blunt ends, the adapter can of course be constructed as a linker designed to match the restriction site.

The analysis of the amplified DNA is preferably carried out by separating the amplified nucleic acid molecules by size, e.g. by gel electrophoresis. Such systems may be provided in a highly automated form and may be operated by robots.

The power of the method according to the present invention lies in the fact that it may be used for defining the phylogenetic relationship of any two sweet potato individuals having different genotypes. To define this relationship, a method according to the present invention is performed on each of the sweet potatoes having different genotypes, giving a defined result with respect to their specific amplification (S-SAP analysis). These results may then be compared with each other. Since each sweet potato gives a characteristic "fingerprint" in this analysis, these fingerprints may be compared with each other, and the phylogenetic relationship may be defined by the degree of similarity between the fingerprints. An impressive demonstration of the power of this method is given in the example section.
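Once each S-SAP fingerprint has been scored as a set of band sizes, the degree of similarity between two fingerprints can be expressed as a number. The text does not prescribe a coefficient; the sketch below assumes the Dice (Nei-Li) coefficient, 2 × shared / (bands in A + bands in B), and also shows the Jaccard coefficient for comparison. The band sizes in the example are invented for illustration and are not results from this study.

```python
def dice_similarity(bands_a: set[int], bands_b: set[int]) -> float:
    """Dice (Nei-Li) similarity: 2 * shared bands / (bands in A + bands in B)."""
    if not bands_a and not bands_b:
        return 1.0  # two empty fingerprints are treated as identical
    shared = len(bands_a & bands_b)
    return 2 * shared / (len(bands_a) + len(bands_b))


def jaccard_similarity(bands_a: set[int], bands_b: set[int]) -> float:
    """Jaccard similarity: shared bands / bands present in either sample."""
    union = bands_a | bands_b
    if not union:
        return 1.0
    return len(bands_a & bands_b) / len(union)


# Hypothetical band sizes (bp) for two sweet potato samples.
sample_1 = {120, 150, 210, 300}
sample_2 = {120, 150, 250}
print(round(dice_similarity(sample_1, sample_2), 3))     # 0.571
print(round(jaccard_similarity(sample_1, sample_2), 3))  # 0.4
```

Both coefficients ignore shared absences, a common choice for dominant markers such as S-SAP, where the absence of a band is less informative than its presence.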


Preferably, the comparing step comprises analysing a size separation of the amplified nucleic acids of each sweet potato species, race, subtype, etc. It is therefore possible to differentiate between geographical areas and secondary distribution areas of specific sweet potato specimens.


There are a number of methods for comparing these "fingerprints"; preferably, these comparisons are performed with computer aids. Several computer programmes are available for such analysis, e.g. Genotyper, TREECON, TFPGA, Arlequin, Genographer, RFLPSCAN, etc.


According to another aspect of the present invention, a kit for performing the methods according to the present invention is also provided, which comprises at least two primers as defined herein (a first primer and a second primer) and a nucleic acid polymerase for amplifying the nucleic acid defined by these two primers.

Preferably, a kit according to the present invention further contains a restriction 
enzyme specific adapter with primer, a ligase enzyme for the adapter ligation, buffers, 
nucleotides, positive or negative controls and mixtures thereof. 


According to another aspect, the present invention also relates to a nucleic acid molecule comprising a sequence of formula II,

(Nx)o AGTCCTAACA (Nx)m (II),

wherein Nx is selected from A, C, G and T; and m and o are, independently of each other, 0 to 1000.

In particular, the present invention provides a nucleic acid molecule comprising SEQ ID NO:1, sequences differing from this sequence by not more than 1 b/bp per 20 b/bp, sequences hybridising under stringent conditions (e.g. 6× SSC, 65 °C) to such sequences, or sequences complementary to such sequences.
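The criterion "not more than 1 b/bp per 20 b/bp" can be read as a maximum of one mismatch per 20 aligned positions, i.e. at least 95% identity. The sketch below checks that reading for two sequences of equal length compared position by position. Handling insertions and deletions would require a proper alignment, which this simplified, hypothetical helper does not attempt.

```python
def within_divergence_limit(query: str, reference: str, window: int = 20) -> bool:
    """True if there is at most one mismatch per `window` positions (ungapped)."""
    if len(query) != len(reference):
        raise ValueError("sketch handles only equal-length, ungapped sequences")
    mismatches = sum(q != r for q, r in zip(query.upper(), reference.upper()))
    # Integer form of mismatches / length <= 1 / window.
    return mismatches * window <= len(reference)


ref = "A" * 40
print(within_divergence_limit("C" + "A" * 39, ref))    # True  (1 mismatch in 40)
print(within_divergence_limit("CCC" + "A" * 37, ref))  # False (3 mismatches in 40)
```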
Preferably, the length of the molecule of formula II is between 10 and 500, especially between 12 and 286. Preferably, it contains the LTR region and optionally the polypurine tract according to FIG. 1 and FIG. 2.

The present invention will be described in more detail by way of the following examples and the drawing figures, yet it is not restricted to these particular embodiments.

BRIEF DESCRIPTION OF THE DRAWINGS 

FIG. 1 shows a partial Ipomoea batatas retrotransposon sequence (3' RNaseH (SEQ ID NO:30), polypurine tract and partial LTR region (SEQ ID NO:1)).

FIG. 2 shows Ipomoea batatas retrotransposon sequences and the LTR primers used (Str6 RNaseH = SEQ ID NO:28, (-1 bp) 3' LTR primer = SEQ ID NO:31; Str85 RNaseH = SEQ ID NO:29, (-1 bp) 3' LTR primer = SEQ ID NO:32; Str187 RNaseH = SEQ ID NO:30, (+1 bp) 3' LTR primer = SEQ ID NO:33; Str187/0 primer = SEQ ID NO:3; Str187/G primer = SEQ ID NO:4; Str187/GC primer = SEQ ID NO:3; E01 primer = SEQ ID NO:18; E44 primer = SEQ ID NO:35).

FIG. 3 shows a comparison of the banding patterns after S-SAP analysis.

FIG. 4 shows a comparison of the S-SAP and AFLP analysis of nine sweet potato 
genotypes. 

FIG. 5 shows an S-SAP analysis of nine different sweet potato resources.

FIG. 6 shows a regional map of Eastern Africa showing the original collection sites of sweet potato varieties.

FIG. 7 shows the distribution of the plants in four groups with different insertion numbers; clear columns represent the range of the insertion number in a group, while dark columns show the number of varieties in a given group.

FIG. 8 shows a list of adapters and primers used for AFLP pre-amplification and selective PCR (EcoA1 adapter = SEQ ID NO:19; EcoA2 adapter = SEQ ID NO:20; E01 primer = SEQ ID NO:18; E33 primer = SEQ ID NO:21; E36 primer = SEQ ID NO:22; MseA1 adapter = SEQ ID NO:23; MseA2 adapter = SEQ ID NO:24; M01 primer = SEQ ID NO:25; M38 primer = SEQ ID NO:26; M40 primer = SEQ ID NO:27).
FIG. 9 shows a phylogenetic analysis of 173 Eastern African varieties by clustering.

FIG. 10 shows a dendrogram based on Nei's (1972) genetic distance, constructed by UPGMA as modified from the NEIGHBOR procedure of PHYLIP version 3.5.

FIG. 11 shows a supposed distribution of the sweet potato in East Africa.

DETAILED DESCRIPTION 


Examples 

The Str187 retrotransposon sequence was found and cloned with a known method (Pearce et al. 1999). After sequencing the Str187 clones (see SEQ ID NO:1), oligonucleotide primer sequences were designed that are capable, in different methods, of fingerprinting and distinguishing sweet potato genomes. In the procedure outlined in the present examples, two types of primers are used:

The LTR primers, or first primers, are designed after the retrotransposon sequence, optionally with features preventing amplification from the 3' LTR. The other primers, or second primers, may have any sequence matching an adapter that is ligated to the restriction site used and that includes a primer binding site. This adapter primer should match the PCR parameters of the first primer. Both primers may be extended at the 3' end with, preferably, 1-3 optional nucleotides. In the method according to the present examples, the primers are used in PCR reactions with sweet potato DNA templates. The nucleic acid polymerase used in the reactions is a commercially available thermostable DNA polymerase from the thermophilic bacterium Thermus aquaticus (Taq polymerase) or another thermostable polymerase.

The nucleotide triphosphate substrates are employed as described in PCR Protocols: A Guide to Methods and Applications, M.A. Innis et al. 1989, and US Patents 4,683,195 and 4,683,204. The substrates can be modified for a variety of experimental purposes in ways known to those skilled in the art.

In the first step (1.) of the present process, sweet potato genomic DNA as template DNA is fragmented with sequence-specific restriction endonucleases. It is possible to use one, two or even three different restriction endonucleases.
The fragmented genomic DNA is ligated to adapter sequences that are compatible with the restriction site and carry designed adapter-specific primer binding sites.

One or more PCR reactions are performed with adapter-specific and LTR-specific primers. Both primers can be extended with extra nucleotides to reduce the number of amplified fragments. In the last PCR reaction the LTR primers are labelled, so that the PCR product amplified with the LTR and adapter primers is distinguishable from products amplified with the adapter primer alone (adapter-adapter products).
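Why different genotypes give different band patterns can be illustrated in silico: each LTR-adapter product spans from an insertion to the nearest EcoRI site in the flanking DNA, so its size depends on where the element inserted. In the hypothetical sketch below, each insertion is given as the position in a flanking sequence where LTR-primer extension starts (assumed to run towards higher coordinates), `offset` stands in for the combined length contributed by the LTR primer and the adapter, and `max_size` models the resolvable range of the gel. All three are assumptions, not parameters from this description.

```python
ECORI_SITE = "GAATTC"


def ssap_band_sizes(flank: str, insertion_sites: list[int],
                    offset: int = 50, max_size: int = 1000) -> list[int]:
    """Predict LTR-adapter product sizes for insertions in one flanking sequence."""
    flank = flank.upper()
    sizes = []
    for site in insertion_sites:
        hit = flank.find(ECORI_SITE, site)
        if hit == -1:
            continue  # no downstream EcoRI site: no LTR-adapter product
        size = (hit + 1 - site) + offset  # EcoRI cuts after the G of GAATTC
        if size <= max_size:
            sizes.append(size)
    return sorted(sizes)


flank = "A" * 100 + "GAATTC" + "A" * 200 + "GAATTC"
print(ssap_band_sizes(flank, [10, 150]))  # [141, 207]  genotype with two insertions
print(ssap_band_sizes(flank, [10]))       # [141]       genotype lacking the second
```

A genotype lacking one insertion simply loses the corresponding band, which is the presence/absence information scored from the gel.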

Such labelling may be performed by any method known in the art. Labelling is preferably carried out with isotopes or by non-isotopic methods such as biotinylation, fluorescent dyes or other methods.

PCR products may be separated by agarose or acrylamide gel electrophoresis, manually or automatically, and visualised depending on the labelling of the (LTR) primer.

Similar procedures have been presented by Waugh et al. (1997), Ellis et al. (1998) and Pearce et al. (2000). Electrophoresis of the amplified genomic fragments carrying the same flanking LTR sequence separates fragments of different length according to their mobility: smaller fragments have higher mobility than longer ones. Different sweet potato samples turned out to have different electrophoresis patterns as a consequence of the positions and number of the retrotransposon insertions. Automated gel electrophoresis systems (sequencer equipment, Genotyper programme) can compare more than a hundred fragments of different length, but it is of course also possible to evaluate the result manually. Conversion of these electrophoresis patterns into a presence/absence (yes/no) per-variety matrix is possible with the GENOTYPER or GENOGRAPHER programmes mentioned above, or can of course be done manually. A clustering analysis of this matrix is possible using methods such as the Unweighted Pair Group Method using Arithmetic Averages (UPGMA) (Sneath and Sokal, 1973) or neighbour joining (Saitou and Nei, 1987) with programmes such as TREECON. Other ordination analyses are also possible, such as multidimensional scaling (MDS) or principal component analysis (PCA), with programmes such as SPSS, SYSTAT, STATISTICA or SAS. Further, for the analysis of the geographical origin of the tested sweet potato samples, it is possible to compare the number of retrotransposon insertions in related genotypes. Plants growing in the same area are exposed to the same stresses, which could, among other things, activate retrotransposons and thus cause further new insertions.
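The conversion to a presence/absence matrix and a UPGMA clustering can be sketched in a few lines of Python. This is a minimal illustration, not a replacement for the programmes named above: it assumes bands have already been scored as exact sizes (no binning tolerance), uses 1 - Dice similarity as the distance (the text does not specify a coefficient) and returns only the tree topology in Newick-like notation, without branch lengths. The sample names and band sizes are invented.

```python
from itertools import combinations


def presence_absence(samples: dict[str, set[int]]) -> dict[str, list[int]]:
    """Score every sample as 1/0 for each band size seen in any sample."""
    loci = sorted(set().union(*samples.values()))
    return {name: [int(locus in bands) for locus in loci]
            for name, bands in samples.items()}


def dice_distance(a: list[int], b: list[int]) -> float:
    """1 - Dice similarity between two presence/absence rows."""
    shared = sum(x & y for x, y in zip(a, b))
    total = sum(a) + sum(b)
    return 0.0 if total == 0 else 1.0 - 2.0 * shared / total


def upgma(matrix: dict[str, list[int]]) -> str:
    """Cluster samples by UPGMA and return the topology as a nested string."""
    active = list(matrix)
    size = {name: 1 for name in active}
    dist = {frozenset(pair): dice_distance(matrix[pair[0]], matrix[pair[1]])
            for pair in combinations(active, 2)}
    while len(active) > 1:
        # Merge the closest pair of clusters.
        a, b = min(combinations(active, 2), key=lambda p: dist[frozenset(p)])
        merged = f"({a},{b})"
        size[merged] = size[a] + size[b]
        for other in active:
            if other not in (a, b):
                # Average linkage: size-weighted mean of the two merged clusters.
                dist[frozenset((merged, other))] = (
                    size[a] * dist[frozenset((a, other))]
                    + size[b] * dist[frozenset((b, other))]
                ) / size[merged]
        active = [c for c in active if c not in (a, b)] + [merged]
    return active[0]


samples = {
    "A": {100, 200, 300},
    "B": {100, 200, 300, 400},
    "C": {500, 600},
}
print(upgma(presence_absence(samples)))  # (C,(A,B))
```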
Sweet potato resources and DNA purification

Lyophilised leaf samples were used for all analyses. Sixty-seven landraces were obtained from the Kenya Agricultural Research Institute gene banks at the University of Nairobi field station, Kabete. Fifty-nine landraces were obtained from the Ugandan National Agricultural Research Institute, Namulonge. Forty-four landraces were obtained from the Tanzania Agricultural Research Institute, Tengeru. Individual pathogen-tested clones from Colombia, Peru, Mexico, Brazil and Papua New Guinea were obtained from the International Potato Centre germplasm collection at Kabete, Kenya. From this total sample, 9 genotypes from different countries were selected for primer comparisons and for comparison of the S-SAP system with other molecular markers (AFLPs and RAPDs). Details of these genotypes are given in Table 1.

Table 1: Names, country of origin and type of genotype of the nine selected varieties used for testing primer combinations and for comparing S-SAP molecular markers with RAPD and AFLP markers

No. | Variety | Other names and codes | Country of origin | Type of genotype
1 | Mafuta | - | Kenya | Landrace
2 | Simama | Kemb10, CIP 440169 | Kenya | Landrace
3 | Kyebandula | EAI 56702 | Uganda | Landrace
4 | Wagabolige | CIP 440167 | Uganda | Landrace
5 | Camote Amarillo | CIP 400014 | Colombia | Landrace
6 | Santo Amaro | CIP 400011 | Brazil | Landrace
7 | No. 221 | CIP 400009 | Mexico | Landrace
8 | Japonese Tresmesino | CIP 420009 | Peru | Landrace
9 | Naveto | CIP 440131 | Papua New Guinea | Landrace

All the KARI (Kenya Agricultural Research Institute) and CIP (International Potato Centre) germplasm was sampled from field collections. For most of the Ugandan and Tanzanian germplasm, vine cuttings were sampled from the field collection and planted in pots in a greenhouse. Four weeks later, fresh leaves were sampled for freeze-drying. In all cases, 5-7 very young leaves were cut from vigorously growing plants and immediately dipped in liquid nitrogen. Freeze-dried leaves were stored at 4 °C until DNA was isolated. About 20 mg of freeze-dried plant material in liquid nitrogen was ground in a bead mill for 5 minutes. Total DNA was isolated and purified with a DNeasy Plant Mini Kit (QIAGEN) following the original protocol. After extraction, 4 µl of 10 mg/ml RNase A was added and the sample was incubated at 37 °C for one hour. DNA was quantified with a TKO 100 mini-fluorometer (Hoefer Scientific Instruments), and its quality was assessed on a 0.8% agarose gel stained with 0.5 µg/µl ethidium bromide in 1× TBE buffer.

PCR amplification of Ty1-copia retrotransposon LTRs

MseI- or EcoRI-digested genomic DNA was amplified with degenerate RNaseH-gene-specific primers and restriction-site-specific flanking (adapter) PCR primers, as described by Pearce et al. 1999. The biotinylated first PCR products were separated on streptavidin-coated magnetic Dynabeads particles.

5'-biotinylated RNaseH primer: 5'-MGNACNAARCAYATHGA (SEQ ID NO:16)
Nested RNaseH primer: 5'-GCNGAYATNYTNACNAA (SEQ ID NO:17)

N: A, G, T or C
M: A or C
R: A or G
Y: C or T
H: A, C or T
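Each ambiguity code in the two degenerate primers stands for two, three or four bases, so each primer is in fact a mixture of many distinct oligonucleotides. The sketch below computes the degeneracy (the number of distinct sequences in the mixture) from the standard IUPAC nucleotide codes and can expand a primer into its concrete sequences; it is offered only as an illustration of the notation, and the function names are hypothetical.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "CG", "W": "AT",
    "H": "ACT", "B": "CGT", "V": "ACG", "D": "AGT", "N": "ACGT",
}


def degeneracy(primer: str) -> int:
    """Number of distinct sequences encoded by a degenerate primer."""
    count = 1
    for code in primer.upper():
        count *= len(IUPAC[code])
    return count


def expand(primer: str) -> list[str]:
    """All concrete sequences encoded by a degenerate primer."""
    return ["".join(bases) for bases in product(*(IUPAC[c] for c in primer.upper()))]


print(degeneracy("MGNACNAARCAYATHGA"))  # 384  (SEQ ID NO:16)
print(degeneracy("GCNGAYATNYTNACNAA"))  # 1024 (SEQ ID NO:17)
```

For SEQ ID NO:16 the product is 2 (M) × 4 × 4 (two N) × 2 (R) × 2 (Y) × 3 (H) = 384.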

The degenerate RNaseH primers were a kind gift from the laboratory of A.J. Flavell (Department of Biochemistry, University of Dundee) and were designed on the basis of sequence homologies among known retrotransposon RNaseH genes. The amplified fragments were cloned into the Topo 4 TA cloning vector (TOPO TA Cloning Kit, Clontech K4575-01) and sequenced.

Identification of LTR sequences of Ty1-copia type transposons of sweet potato:

Approximately one hundred clones with a variable degree of homology to the Ty1-copia RNaseH gene were identified, but only three (Str6, Str85, Str187) showed the characteristic RNaseH gene, stop codon, polypurine tract and putative 3' LTR sequence elements (FIG. 2). The Str6 and Str187 sequences proved to be homologous to Ty1-copia retrotransposons. The Str85 clone was not recognised by BLAST search as a copia-type retrotransposon sequence, despite the copia-homologous primer site in the RNaseH-like sequence and the polypurine tract region. The putative inverted repeat (IR) region of the LTR differs among the three sweet potato sequences; only the Str187 clone contains the characteristic TGTT sequence. Although other IR sequences occur at lower frequencies (e.g. TAGTT in Picea abies Tpa8), such a sequence is believed to be a mutation. Furthermore, in the putative LTR region of the Str6 clone, a 34 bp direct repeat was recognised after the TATT inverted repeat sequence, which provides further evidence of an unusually high mutation rate in the sweet potato retrotransposon population. The starting point of the 3' LTR sequences of the remaining sequenced clones could not be determined, since they did not contain a recognisable polypurine tract after the RNaseH gene stop codon. In many cases the sequence was interrupted by the MseI restriction site; using the rare-cutting EcoRI enzyme to fragment the genomic DNA yielded longer clones, but identification of the LTR sequence was still not possible.

The LTR sequences detected in the Str6 and Str187 clones proved to be functional in S-SAP analysis, while Str85 did not produce an amplified polymorphic banding pattern. FIG. 2 shows the list of the LTR and Eco adapter primers tested in S-SAP reactions.

Discussion

PCR amplification of the Ty1-copia retrotransposon LTRs

Sweet potato DNA sequences were isolated with degenerate oligonucleotide primers corresponding to conserved domains of the Ty1-copia retrotransposon RNaseH gene and with flanking adapter primers. The amplified fragments were cloned as described in Methods. 200-300 random clones were sequenced, but only three clones with recognisable LTR sequences were found. In these three clones, the stop codon of the RNaseH gene, the characteristic polypurine tract and the putative 3' LTR region could be distinguished.

Every retrotransposon class has a different LTR region, which is conserved within the class but not between classes. Given that only two functional LTR regions were found among several hundred sequenced clones, one can suppose that the mutation rate of retrotransposons in sweet potato is very high, and also that only a few retrotransposon classes exist. On the other hand, it has to be considered that sweet potato is propagated mainly vegetatively, which means that a retrotransposon insertion in vegetative cells has a longer "life time" and therefore a greater chance of mutation. It is known that different biotic and abiotic stresses can induce the mobility of retrotransposons (Mhiri et al., 1997; Grandbastien et al., 1997). However, plant genomes have evolved mechanisms to repress uncontrolled retrotransposon expansion, such as DNA methylation (Liu and Wendel 2000), deleterious mutations (Nuzhdin 1999; Heslop-Harrison et al. 1997), and unequal crossing over and/or intrachromosomal recombination between LTRs (Shirasu et al. 2000).

The high variability of Str187 retrotransposon insertions between different sweet potato clones points to the mobility of this retrotransposon.

After preliminary experiments, the Str187 retrotransposon LTR sequences were used to design S-SAP primers. Increasing the number of selective nucleotides on the adapter or LTR primers reduced the number of detected insertions, as expected. However, the reduction was much more effective (4-5-fold per nucleotide) when selective nucleotides were added to the LTR primer. The best scoring was achieved with only a one-nucleotide extension on the LTR primer, but it has to be considered that only a 6 bp cutting enzyme was used to fragment the genomic DNA. Waugh et al. fragmented the barley genomic DNA with a 6 bp and a 4 bp cutting enzyme according to the AFLP procedure, generating more and shorter genomic fragments, but they had to reduce the number of amplified fragments to a scorable amount by increasing the number of selective nucleotides. Furthermore, they did not use a selective nucleotide on the LTR primer; consequently, they amplified not only plant-specific genomic DNA but possibly internal retrotransposon sequences too.



S-SAP method

The procedure of Waugh et al. (1997) was adapted to sweet potato with some
modifications.

Genomic DNA was digested with only one rare-cutting enzyme (EcoRI) and ligated
to a specific adapter in the same reaction. Two PCR reactions were then performed.

The first, pre-selective PCR amplification was performed with Dynazyme Taq
polymerase in 50 µl reactions for 30 cycles at an annealing temperature of 52°C,
using the LTR-specific primer without any extension and the E01 adapter primer
(Table 2).
Table 2

LTR primers
Str187/0    5'-AGACTAAGAGTCCTAACA-3'      (SEQ ID NO:3)
Str187/G    5'-AGACTAAGAGTCCTAACACG-3'    (SEQ ID NO:4)

Adaptor primer
E01         5'-GACTGCGTACCAATTCA-3'       (SEQ ID NO:18)

Str187 primers used in S-SAP analysis
First reaction: Str187/0-E01
Second reaction: Str187/G-E01



The second, selective PCR amplification was performed with Qiagen Hot Taq DNA
polymerase in 25 µl reactions, using a touchdown programme from 70°C to 55°C
(-0.7°C per cycle) followed by another 20 cycles at an annealing temperature of 55°C.
The FAM-labelled transposon primer extended with a selective nucleotide (Str187/G)
was combined with the E01 adapter primer (Table 2). The reactions were loaded on an
acrylamide gel and separated on an ABI 373 automated sequencer.
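The annealing programme of the selective PCR can be written out cycle by cycle. This is only an illustrative sketch: the number of touchdown cycles is not stated above, so it assumes the temperature drops by 0.7°C per cycle for as long as it stays at or above 55°C (22 cycles, the last at 55.3°C), followed by the 20 cycles at 55°C. Denaturation and extension steps are omitted, and the function name `touchdown_schedule` is hypothetical.

```python
def touchdown_schedule(start_c=70.0, end_c=55.0, step_c=0.7, hold_cycles=20):
    """Annealing temperature (°C) for every cycle of a touchdown PCR programme.

    Temperatures are handled in tenths of a degree (integers) so that the
    repeated 0.7°C steps do not accumulate floating-point error.
    """
    start, end, step = round(start_c * 10), round(end_c * 10), round(step_c * 10)
    touchdown = []
    t = start
    while t >= end:                 # step down until the next value would undershoot
        touchdown.append(t / 10)
        t -= step
    # then hold at the final annealing temperature
    return touchdown + [end / 10] * hold_cycles

schedule = touchdown_schedule()
print(len(schedule), schedule[:3], schedule[21:23])
```

Generating the list explicitly makes it easy to check the total cycle count before programming a thermocycler.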

Adaptation of the S-SAP method to sweet potato

In the original S-SAP protocol (Waugh et al.), the genomic DNA is cut with two
enzymes, as is usual in AFLPs: a rare cutter and a frequent cutter (Vos et al.).
However, when adapting the S-SAP technique to sweet potato, digesting the genomic
DNA with only one rare-cutting enzyme instead of two improved the number and
length of the polymorphic bands. Further improvement was achieved by
pre-amplifying the adapter-ligated DNA with the adapter and non-labelled LTR
primers. The second, specific amplification was carried out with the adapter primer
and the LTR-specific primer extended by a selective nucleotide. These modifications
resulted in a high number of amplified products, both polymorphic and
monomorphic. In preliminary experiments the three sweet potato LTR primers were
tested in S-SAP analysis, and Str187 showed the highest level of polymorphism. The
Str6 primer produced a moderate number of polymorphic patterns, but no
amplification products were obtained with Str85.

Subsequent experiments were carried out with the Str187 LTR primers.
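Why a single rare cutter gives longer S-SAP products can be illustrated with expected site spacing: in a random sequence a 6 bp site such as EcoRI (GAATTC) occurs about once every 4^6 = 4096 bp, whereas adding a 4 bp cutter such as MseI (TTAA, about once every 256 bp) shortens the fragments considerably. Because an S-SAP product spans from the LTR to the nearest adapter-ligated cut, longer fragments leave room for longer products. The simulation below is a hypothetical sketch on a uniform random sequence, not sweet potato data; `fragment_lengths` is an illustrative helper.

```python
import random
import re

def fragment_lengths(seq, sites):
    """Fragment lengths after cutting seq at every occurrence of any recognition site.

    For simplicity each cut is placed at the start of the site; the real cut offset
    inside the site only shifts fragment ends by a few bases.
    """
    pattern = "|".join(f"(?={re.escape(s)})" for s in sites)  # lookahead also finds overlapping sites
    cuts = [m.start() for m in re.finditer(pattern, seq)]
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]

def mean(values):
    return sum(values) / len(values)

# Hypothetical uniform random sequence: real plant genomes are not random,
# so the numbers only illustrate the trend.
random.seed(1)
genome = "".join(random.choices("ACGT", k=500_000))

eco_only = fragment_lengths(genome, ["GAATTC"])          # EcoRI alone
eco_mse = fragment_lengths(genome, ["GAATTC", "TTAA"])   # EcoRI + MseI
print(f"EcoRI only: {mean(eco_only):.0f} bp, EcoRI+MseI: {mean(eco_mse):.0f} bp")
```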






Nine sweet potato varieties from Africa, South and Central America and Papua New
Guinea were selected and tested with the different LTR/adapter primer combinations.
Table 3 shows the results of these comparisons.

Table 3

Freq.   E44/187GC     E01/187GC     E44/187G      E01/187G
        N     %       N     %       N     %       N     %
1       25    64      32    63      79    46      86    33
2        3     8       4     8      40    23      51    20
3        3     8       4     8      24    14      41    16
4        4    10       3     6      11     6      28    11
5        3     8       1     2      10     6      19     7
6        0     0       2     4       4     2       7     3
7        0     0       4     8       2     1      10     4
8        0     0       0     0       2     1      12     5
9        1     2       1     2       1     1       6     2
Ins.    39            51           173           260





Table 3: Comparison of the different primer combinations in S-SAP analysis of nine
sweet potato varieties. Frequency (Freq.) is the number of the nine genomes (from one
to all nine) in which a given Str187 retrotransposon insertion is present. Column N
gives the number of insertions present in exactly that many of the nine genomes; the
same data are also shown as percentages (%). The row Ins. gives the total number of
insertions amplified with the given primer pair.

The E44 adapter primer in combination with the Str187GC or Str187G primers gave 39
and 173 bands, respectively, each representing an individual retrotransposon
insertion. Reducing the number of selective nucleotides on the LTR primer therefore
significantly increased the number of amplified insertions.

The same relationship was observed for the E01/187GC and E01/187G primers:
reducing the selective extension by one nucleotide increased the number of amplified
insertions from 51 to 260.

The number of selective nucleotides on the adapter-specific primer had only a minor
effect on insertion amplification.
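The effect of the selective nucleotides can be quantified directly from the Ins. row of Table 3. The short calculation below is a sketch; the selective bases carried by the E44 primer are not specified in this section, so the adapter comparison is simply E01 versus E44. For reference, one additional random 3' base would be expected to retain about a quarter of the products (a 4-fold reduction).

```python
# Total insertions per primer pair, copied from the "Ins." row of Table 3
totals = {"E44/187GC": 39, "E01/187GC": 51, "E44/187G": 173, "E01/187G": 260}

# One extra selective nucleotide on the LTR primer (187G -> 187GC), adapter fixed
ltr_effect = {a: totals[f"{a}/187G"] / totals[f"{a}/187GC"] for a in ("E44", "E01")}

# Changing the adapter primer (E01 -> E44), LTR primer fixed
adapter_effect = {l: totals[f"E01/{l}"] / totals[f"E44/{l}"] for l in ("187G", "187GC")}

for name, ratio in {**ltr_effect, **adapter_effect}.items():
    print(f"{name}: {ratio:.1f}-fold")
```

The LTR-primer ratios (about 4.4 and 5.1) agree with the 4-5-fold reduction per nucleotide described earlier, while switching the adapter primer changes the yield only about 1.3-1.5-fold.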
Table 3 presents the frequencies of the insertions amplified from only one, two or up
to all nine plant genomes. The polymorphism is very high: monomorphic bands make
up only 1-2% of the total number of insertions, whereas unique insertions, amplified
from only one plant genome, account for 33-64% of the total.
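The frequency classes of Table 3 can be derived from a presence/absence matrix by counting, for each insertion, how many varieties carry it. The functions below are a hypothetical sketch (names are illustrative); the usage example applies them to a toy matrix and to the E01/187G column of Table 3.

```python
from collections import Counter

def frequency_spectrum(matrix):
    """{k: number of insertions present in exactly k varieties}.

    matrix: one row per insertion (band), one 0/1 entry per variety.
    """
    return dict(sorted(Counter(sum(row) for row in matrix).items()))

def summarise(spectrum, n_varieties):
    """Total insertions, % unique (one genome only) and % monomorphic (all genomes)."""
    total = sum(spectrum.values())
    unique = 100 * spectrum.get(1, 0) / total
    monomorphic = 100 * spectrum.get(n_varieties, 0) / total
    return total, round(unique, 1), round(monomorphic, 1)

# Toy matrix: 4 insertions scored in 3 varieties (hypothetical)
toy = [[1, 0, 0], [1, 1, 1], [0, 1, 0], [1, 1, 0]]
print(frequency_spectrum(toy))

# N column for E01/187G, copied from Table 3
e01_187g = {1: 86, 2: 51, 3: 41, 4: 28, 5: 19, 6: 7, 7: 10, 8: 12, 9: 6}
print(summarise(e01_187g, 9))
```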

A phylogenetic analysis of the nine sweet potato varieties with the E01-Str187/G and
E44-Str187/G primer combinations is shown in FIG. 5. Both primer combinations
distinguish the South American varieties from the African ones. The clones from
Mexico and Papua New Guinea were associated with the African types. With the two
other primer combinations, where the LTR primer is extended with two nucleotides,
the South American and African varieties were not differentiated from each other
(data not shown).

AFLP analysis

The AFLP methodology was essentially as described by Vos et al. (1995) but adapted
for sweet potato with fluorescent labelling and running of the gel on a sequencer. Two
restriction enzymes, MseI and EcoRI, were used to fragment the genomic DNA. The
restriction-digested DNA was subsequently ligated to two different synthetic
double-stranded oligonucleotide adapters, each consisting of a short DNA sequence
and the restriction enzyme recognition site (Table 4). Pre-amplification was done
using primers E01 and M01, with an annealing temperature of 60°C for 45 cycles.

Selective amplification of the pre-amplification PCR products was done with primers
identical to the pre-amplification primers but carrying 2 additional selective
nucleotides at their 3' ends (Table 4).

Table 4: List of adapters and primers used for AFLP pre-amplification and selective
PCR

EcoRI adapters              Nucleotide sequence
EcoA1                       5'-CTC GTA GAC TGC GTA CC-3' (SEQ ID NO:19)
EcoA2                       5'-AAT TGG TAC GCA GTC-3' (SEQ ID NO:20)
Pre-amplification primer
E01                         5'-GAC TGC GTA CCA ATT CA-3' (SEQ ID NO:18)
Selective PCR primers
E33                         5'-GAC TGC GTA CCA ATT CAA G-3' (SEQ ID NO:21)
E36                         5'-GAC TGC GTA CCA ATT CAC T-3' (SEQ ID NO:22)

MseI adapters
MseA1                       5'-GAC GAT GAG TCC TGA G-3' (SEQ ID NO:23)
MseA2                       5'-TAC TCA GGA CTC AT-3' (SEQ ID NO:24)
Pre-amplification primer
M01                         5'-GAT GAG TCC TGA GTA AA-3' (SEQ ID NO:25)
Selective PCR primers
M38                         5'-GAT GAG TCC TGA GTA AAC T-3' (SEQ ID NO:26)
M40                         5'-GAT GAG TCC TGA GTA AAG C-3' (SEQ ID NO:27)
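The primers in Table 4 follow a simple construction: a core made of the adapter-derived sequence plus the remnant of the restriction site left after cutting (AATTC for EcoRI, TAA for MseI), followed by the 3' selective nucleotides. The sketch below rebuilds them from two cores read off the selective primers; `aflp_primer` is a hypothetical helper, and the rebuilt E01 equals the E01 sequence given in Table 2.

```python
# Primer cores: adapter-derived sequence + restriction-site remnant
ECO_CORE = "GACTGCGTACC" + "AATTC"   # EcoRI cuts G^AATTC
MSE_CORE = "GATGAGTCCTGAG" + "TAA"   # MseI cuts T^TAA

def aflp_primer(core, selective):
    """AFLP primer = core followed by the 3' selective nucleotide(s)."""
    return core + selective

primers = {
    "E01": aflp_primer(ECO_CORE, "A"),
    "E33": aflp_primer(ECO_CORE, "AAG"),
    "E36": aflp_primer(ECO_CORE, "ACT"),
    "M01": aflp_primer(MSE_CORE, "A"),
    "M38": aflp_primer(MSE_CORE, "ACT"),
    "M40": aflp_primer(MSE_CORE, "AGC"),
}

# Each selective primer extends its pre-amplification primer by exactly two bases
for sel, pre in [("E33", "E01"), ("E36", "E01"), ("M38", "M01"), ("M40", "M01")]:
    assert primers[sel].startswith(primers[pre])
    assert len(primers[sel]) == len(primers[pre]) + 2

for name, seq in primers.items():
    print(name, seq)
```

Writing the primers this way makes the "pre-amplification primer plus 2 selective nucleotides" relationship explicit and easy to check.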



EcoRI selective primers were labelled with the ABI FAM fluorescent dye to prevent the
occurrence of 'doublets' on the gels due to unequal mobility of the two strands of the
amplified fragments (Vos et al., 1995). The samples were loaded on a 6% denaturing
polyacrylamide gel and run on an ABI Prism 373 sequencer for 10 hours. The gel was
scanned and the samples were extracted using the GENESCAN 3.1 programme. The
PCR products of the selective amplification were visualised, with an internal size
standard incorporated into each sample. Visualised peaks indicating the positions of
amplified fragments were analysed with the GENOTYPER 2.5 programme to develop
a 0/1 (absence/presence) fragment-by-sample matrix. Peak filter conditions were set to
include only peaks with a scaled height of at least 30. Selection of categories was
done as described above for the S-SAP procedure. Informative products typically fall
within 50-450 bp (Sharbel 1999); only categories between 50 and 400 bp were used
for data analysis.
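The peak filter and size window above can be expressed as a simple predicate over called peaks. This is a sketch with hypothetical peak values (not GENOTYPER output), assuming each peak has already been sized against the internal standard; the `Peak` type and `informative_peaks` function are illustrative names.

```python
from typing import NamedTuple

class Peak(NamedTuple):
    sample: str
    size_bp: float   # fragment size estimated against the internal size standard
    height: float    # scaled peak height

def informative_peaks(peaks, min_height=30, min_bp=50, max_bp=400):
    """Keep peaks that pass the height filter and fall inside the scoring window."""
    return [p for p in peaks if p.height >= min_height and min_bp <= p.size_bp <= max_bp]

# Hypothetical peaks, for illustration only
peaks = [
    Peak("Zapallo", 45.2, 120),   # shorter than 50 bp
    Peak("Zapallo", 112.6, 85),
    Peak("Naveto", 112.9, 22),    # below the height threshold
    Peak("Naveto", 388.0, 40),
    Peak("Naveto", 431.5, 60),    # longer than 400 bp
]
kept = informative_peaks(peaks)
print(kept)
```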

RAPD analysis

RAPD amplifications were carried out as described by Williams et al. (1991), with a
few modifications as described in Gichuki et al. (2001).

Gel and data analysis

Data were analysed with the Genotyper 2.5 programme. Peaks corresponding to an
amplified retrotransposon insertion were assigned to categories. The tolerance of a
category was set to ±0.25-0.5 bp; that is, two amplified fragments differing by more
than 0.5 or 1 bp, respectively, were assigned to two different categories. Data
representing insertions in bp were converted with the Genotyper programme into a
presence/absence (1/0) insertion-by-variety matrix for use in other phylogenetic
programmes such as Treecon.
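How categories with a ± tolerance turn sized peaks into a 1/0 matrix can be sketched as follows. Genotyper's own binning algorithm is not described here, so this assumes a greedy scheme: sizes are sorted and a new category opens whenever a fragment lies more than twice the half-width above the first fragment of the current category (0.5 bp for a ±0.25 bp tolerance). The input sizes and the function name are hypothetical.

```python
def presence_absence(calls, half_width=0.25):
    """Bin sized fragments into categories and build a 1/0 variety-by-category matrix.

    calls: {variety: [fragment sizes in bp]}. A new category starts whenever a size
    is more than 2 * half_width bp above the first size of the current category.
    """
    events = sorted((size, variety) for variety, sizes in calls.items() for size in sizes)
    categories = []                                   # each: list of (size, variety)
    for size, variety in events:
        if categories and size - categories[-1][0][0] <= 2 * half_width:
            categories[-1].append((size, variety))
        else:
            categories.append([(size, variety)])
    centres = [round(sum(s for s, _ in cat) / len(cat), 2) for cat in categories]
    matrix = {v: [int(any(var == v for _, var in cat)) for cat in categories] for v in calls}
    return centres, matrix

calls = {"A": [101.1, 150.0], "B": [101.3, 149.2], "C": [150.2]}   # hypothetical sizes
centres, matrix = presence_absence(calls)
print(centres)
print(matrix)
```

Anchoring each category to its first member, rather than to the last fragment added, keeps a run of slightly larger fragments from drifting into one over-wide category.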
Comparison of the RAPD, AFLP and S-SAP

The Ty1-copia transposon-based S-SAP analysis is a dominant marker system yielding
a multiband pattern. Each individual band of this pattern represents a unique
retrotransposon integration site (FIG. 3). The objective was to test whether a
genotyping system based on the consecutive integration of retrotransposon elements
gives a genetic relatedness of accessions similar to that obtained with RAPDs and
AFLPs, which are based on alterations of the DNA sequence. Therefore nine sweet
potato genotypes representing different geographic regions, already identified by
RAPD analysis, were analysed by the AFLP and S-SAP techniques (Table 1).

The banding patterns were compared with UPGMA dendrograms using the Nei (1979)
genetic distance (FIG. 4).
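The distance and clustering steps can be sketched in a few lines. This assumes that the cited Nei (1979) distance is the Nei and Li (1979) coefficient commonly applied to dominant band data, D = 1 - 2n_xy/(n_x + n_y), where n_xy is the number of shared bands and n_x, n_y are the band counts of the two genotypes. It is a minimal illustration with hypothetical profiles, not the Treecon implementation.

```python
from itertools import combinations

def nei_li_distance(a, b):
    """Nei-Li (Dice) distance between two 0/1 band profiles: 1 - 2*n_xy / (n_x + n_y)."""
    shared = sum(1 for x, y in zip(a, b) if x and y)   # bands present in both
    total = sum(a) + sum(b)                            # bands in each profile
    return 1.0 - 2.0 * shared / total if total else 0.0

def upgma(profiles):
    """UPGMA clustering of {name: 0/1 profile}; returns nested (left, right, height) tuples."""
    names = list(profiles)
    clusters = {i: (name, 1) for i, name in enumerate(names)}   # id -> (subtree, leaf count)
    dist = {frozenset(p): nei_li_distance(profiles[names[p[0]]], profiles[names[p[1]]])
            for p in combinations(clusters, 2)}
    next_id = len(names)
    while len(clusters) > 1:
        pair = min(dist, key=dist.get)          # closest pair of clusters
        height = dist[pair] / 2                 # node height in an ultrametric tree
        i, j = sorted(pair)
        (ti, ni), (tj, nj) = clusters.pop(i), clusters.pop(j)
        for k in clusters:                      # size-weighted average linkage
            dist[frozenset((next_id, k))] = (
                ni * dist[frozenset((i, k))] + nj * dist[frozenset((j, k))]) / (ni + nj)
        dist = {p: d for p, d in dist.items() if i not in p and j not in p}
        clusters[next_id] = ((ti, tj, round(height, 3)), ni + nj)
        next_id += 1
    return next(iter(clusters.values()))[0]

def leaf_names(tree):
    """Set of variety names below a node."""
    if isinstance(tree, str):
        return {tree}
    left, right, _ = tree
    return leaf_names(left) | leaf_names(right)

# Hypothetical 0/1 insertion profiles (not data from the study)
profiles = {
    "A": [1, 1, 1, 0, 0], "B": [1, 1, 0, 0, 0],
    "C": [0, 0, 1, 1, 1], "D": [0, 0, 0, 1, 1],
}
tree = upgma(profiles)
left, right, height = tree
print(tree)
```

The Nei-Li coefficient ignores shared absences, which suits dominant markers such as S-SAP and AFLP bands, where a missing band does not identify a shared allele.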

Table 5

Method   Genotypes   Total    Total amplification   Polymorphic   Mean products   Polymorphic
         analysed    assays   products              products      per assay       loci (%)
RAPD     9           12       74*                   65            6.2             87.8
AFLP     9           2        228                   179           114             78.5
S-SAP    9           1        260                   254           260             97.7



Summary of each type of analysis performed: total number of amplification products
obtained per analysis type, number that were polymorphic, mean number of products
per assay (primer or primer pair) and overall percentage of polymorphic loci.

*Only distinct bands which demonstrated polymorphism were scored for RAPDs.

Table 5 shows the details of the three analysis methods. The percentage of
polymorphic loci was highest in the S-SAP analysis (97.7%), where 260 insertions
were amplified with only one primer pair. In barley, a 25-30% increase in the rate of
polymorphism has been observed with retrotransposon-based S-SAP compared with
standard AFLP (Kumar 1996; Waugh et al., 1997; Gong-Xin Yu and R.P. Wise 2000).
In the present case the increase over the AFLP method is smaller, 19 percentage
points. Although RAPDs showed a high level of polymorphism, only distinct banding
patterns which had shown polymorphism in an earlier study of 74 genotypes were
included (Gichuki et al., 2001 in paper). The polymorphism of the RAPD analysis is
therefore overestimated and not comparable with the AFLP and S-SAP data
(Table 5). The high polymorphism observed with all three methods may be due to the
vegetative propagation of sweet potato.
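The derived columns of Table 5 follow from the raw counts. The calculation below reproduces them and makes explicit that the 19% difference between S-SAP and AFLP quoted above is a difference in percentage points (97.7 - 78.5); expressed as a relative increase it would be about 24.5%. As noted, the RAPD figures count only polymorphic distinct bands and are not directly comparable.

```python
# Raw counts from Table 5: (assays, amplification products, polymorphic products)
methods = {"RAPD": (12, 74, 65), "AFLP": (2, 228, 179), "S-SAP": (1, 260, 254)}

stats = {}
for name, (assays, products, polymorphic) in methods.items():
    stats[name] = {
        "products_per_assay": round(products / assays, 1),          # multiplex ratio
        "percent_polymorphic": round(100 * polymorphic / products, 1),
    }

s_sap = stats["S-SAP"]["percent_polymorphic"]
aflp = stats["AFLP"]["percent_polymorphic"]
gap_points = round(s_sap - aflp, 1)              # difference in percentage points
relative = round(100 * (s_sap / aflp - 1), 1)    # relative increase in %

print(stats)
print(gap_points, relative)
```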

All three genotyping methods clearly identified two South American clones, Zapallo
(Peru) and Camote Amarillo (Colombia), as a separate group (see FIG. 4). The four
African clones were also identified as another group. In all three cases, the Mexican
clone No. 221 and the Papua New Guinea clone Naveto were related to the African
clones. The Brazilian clone Santo Amaro was related to the South American clones in
both the S-SAP and the RAPD analyses and to the African clones in the AFLP
analysis.

The important factors in the choice of a genetic marker include development time
and cost, capital outlay, amount and quality of DNA required, prior knowledge of
DNA sequence, required technical expertise, robustness, informativeness, genome
coverage and reproducibility (Vos et al., 1995; Milbourne et al., 1997; Milbourne et
al., 1998; Powell et al., 1996). S-SAP markers require a higher initial development
cost than both RAPDs and AFLPs because the LTR sequence of the retrotransposon
must first be isolated. On the other hand, the cost of adapting the LTR sequence to
specific genomes is comparable to that of AFLPs. S-SAP was shown to be superior to
both RAPD and AFLP in terms of the number of amplification products revealed and
the number of polymorphic loci (Table 5). To select the 12 random RAPD primers,
more than 100 primers were screened, and only about half of them produced any
amplification products. Considering that 12 RAPD assays and 2 AFLP assays were
required to achieve approximately the same level of analysis as a single S-SAP assay,
the S-SAP procedure may, on a per-assay basis, be the fastest of the three methods for
genetic analysis and characterisation of sweet potato, at comparable cost.

Compared with fluorescent AFLPs, the S-SAP peaks were found to be more distinct.
Although both AFLP and S-SAP markers are dominant, the high multiplex ratio of
S-SAPs indicates that they are more informative. AFLP and RAPD markers target
random regions of the genome. However, some authors have expressed concerns
about centromeric clustering of AFLP markers, particularly for linkage studies, since
most AFLP primers seem to target the AT-rich centromeric region of the
chromosome. The Ty1-copia retrotransposon is widely distributed throughout the
genome (Pearce 1996; Schmidt 1996; Heslop-Harrison 1997), which implies that
Ty1-copia LTR S-SAP markers, being anchored to the retrotransposon, are also widely
distributed. Reproducibility of a marker system is important, especially for
germplasm characterisation, mapping and wherever results have to be exchanged
between different laboratories and scientists. AFLPs have been shown to be more
reproducible than RAPDs (Jones et al., 1995), and the sequence-specific nature of
S-SAP analysis may improve reproducibility further. Preliminary results indicated a
high level of reproducibility using different PCR equipment (data not shown).
Considering all these factors, the Ty1-copia S-SAP marker system is a powerful
method for genetic analysis in sweet potato. The usefulness of retrotransposon S-SAP
markers has already been demonstrated in barley (Waugh et al., 1997) and in peas
(Elliot et al.).

Example II

Analysis of the East African clones

One hundred and seventy-one East African accessions from Uganda, Tanzania and
Kenya were analysed using the E01_187G primer combination in the S-SAP analysis.
This primer combination yielded the highest number of polymorphic bands. PCR
amplification and size analysis of the fragments were done as described in Materials
and Methods.

From different areas of East Africa, a total of 61 varieties from Kenya, 44 from
Tanzania and 61 from Uganda were selected. The Kenyan varieties came from the
Central and Western Highlands and from the Nyanza region of the Lake Victoria
basin. The Tanzanian varieties came from three areas: the East coast, the
North-Central Highlands and the Lake zone. The Ugandan varieties were grouped into
those originating from North-east Uganda and the rest, originating from Central and
Western Uganda. The geographical areas of origin are shown in FIG. 6.

In the S-SAP analysis of all the samples, 242 insertion categories of the Str187
retrotransposon were found. FIG. 7 presents all the varieties in a dendrogram based on
UPGMA analysis. To simplify the analysis, the samples were compared according to
their geographical origin or as members of a given monophyletic group established by
Treecon UPGMA analysis. The 172 varieties were first grouped by geographical
origin and pooled by region within each country; the pooled results were then scored
and a phylogenetic tree was constructed (see FIG. 10).
The phylogenetic tree shows a separation of the East African sources. East and North
Tanzania are separated from the Lake part of Tanzania, which is closely related to the
Central/Western Ugandan samples. These results correspond to the geographical
positions. Interestingly, the Northeast Ugandan samples map closer to the Kenyan
ones than to the Central/Western Ugandan varieties, which is also plausible given
their geographical location. Although the Central Kenyan samples grouped together
with the other Kenyan varieties on the phylogenetic tree, they are separated from the
Western and Nyanza parts of the country.

The results correlate with the geographical localisation.

Secondary distribution of the retrotransposon insertions

Comparing the 172 tested varieties with each other by UPGMA cluster analysis, ten
subgroups were identified. The subgroups are listed in Table 6.

Table 6: Groups based on the phylogenetic analysis



Gr.1


Gr.2


Gr.3


Gr.4


Gr.5


Gr.6


Gr.7


Gr.8


Gr.9


Gr.10



KWA104 


KWAIOO 


UBIOI 


KNB22 


KNB2 


TEB158 


KWBl 


KNA28 


KCA113 


rEB148 


KWA108 


KCA103 


UB102 


KN823 


KNB16 


rEB161 


KNBIO 


TLB 122 


KWAl 02 


TLB117 


KCA117 


KCA105 


UB103 


KN824 


KWB21 


FEB 165 


KN811 


TLB 123 


k:cai9 


UNC25 


KWAl 19 


KCAllO 


UB104 


KWB18 


KNB25 


rEB166 


KWB12 


rEB125 


UA22 




KNA121 


KNA120 


UB106 


UNCll 


KWB46 


rEB169 


KNB13 


rEB126 


UA4 




KNAIBI 


k:cai29 


UB108 




KWB3 


FEB 159 


KNB15 


rEB127 


KWB20 




K:NA134 


KCA36 


UB109 




KNB47 


rEB152 


KNB26 


TEB128 


UNC4 




KNA135 


TZA52 


UBllO 




KNB14 


rEB131 


KWB27 


TEB129 






UA32 


UA61 


UB112 




UNC16 


rZA27 


ICNB28 


TEB132 






KCA38 


k:na86 


UB113 




UNC2 


KCA7 


k:nb29 


TEB138 






KNA56 


ICWA98 


UB114 






KWB6 


KNB31 


TEB139 






KCA77 


KNB19 


rails 








KNB32 


TEB141 






KNA79 


UNC21 


18116 








KNB34 


TEB144 






KNA80 


UNC24 


18120 








KNB37 


TEB146 






KCA85 


UNC29 


18121 








KNB38 


TEB147 






KCA94 


UNC31 


rB142 








KN839 


rEBiso 






KCA99 


LJNC35 


KWB54 








KWB4 


lEBlSl 






LrNC22 




rNB59 








KN841 


TEB135 






UNC26 




rN860 








KN85 


rEB153 
UNC27




TLB61 








KNB51 


TEB155 






UNC28 




TNB69 








UNCI 


TEB156 






UNC30 




TLB79 








UNCIO 


TEB157 






UNC32 




UB81 








UNC13 


KWB33 






UNC33 




UB83 








UNC14 


TLB75 






UNC34 




UB84 








UNCI 5 








UNC36 




UB86 








UNCI 8 








UNC37 




UB87 








UNC19 








UNC38 




UB88 








UNC20 












UB89 








UNC3 












UB90 








UNC5 












UB92 








UNC6 












UB99 








UNC7 












UNC23 








UNC8 




















UNC9 













This type of analysis shows similar results, but divergence can be observed not only
between but also within the different country parts. The details are shown in Table 7.

Table 7: Distribution of the varieties in the clustered groups





Region                   Group   Percentage   No. of possible insertion sites
Central Kenya            1       43%          203
                         2       36%          164
Western Kenya            7       25%          162
                         1       18%          203
Nyanza                   7       48%          162
                         1       20%          203
Central/Western Uganda   3       84%          161
E/NE Uganda              1       30%          203
                         2       14%          164
                         7       39%          162
Tanzania-East            8       66%          82
                         6       28%          93
Tanzania-Lake            3       60%          161
                         8       30%          82
Tanzania-North           3       100%         161
The Kenyan varieties are grouped mostly into Groups 1, 2 and 7, together with the
Northeast Ugandan ones. For example, 43% of the Central Kenyan clones are in
Group 1 and 33% of them in Group 2. Similarly, the Nyanza clones are distributed
mainly into Group 7, but smaller percentages are also present in Groups 1 and 2.
Western Kenyan samples show the highest diversity: the highest representation is in
Group 7 with 23%, but they can also be found in Groups 1, 2 and 5. The Northeast
Ugandan clones show similarity with the Kenyan ones; they map into Groups 7, 1 and
2 (39%, 30% and 14%, respectively). The Central-Western Ugandan clones are much
more conserved: eighty-four percent of them are in Group 3, together with the three
North Tanzanian varieties and 60% of the Lake Tanzanian samples. Another thirty
percent of the Lake Tanzanian varieties are in Group 8, together with 66% of the East
Tanzanian samples. The remaining 28% of the East Tanzanian varieties were
separated into Group 6.
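Percentages like those in Table 7 come from tallying, for each region, how its varieties fall into the UPGMA groups. The sketch below uses a hypothetical toy assignment with generic region names (not the study's data); `group_distribution` is an illustrative name.

```python
from collections import Counter, defaultdict

def group_distribution(assignments):
    """assignments: iterable of (region, group) pairs, one per variety.

    Returns {region: {group: percentage of that region's varieties}},
    with groups listed from most to least frequent.
    """
    by_region = defaultdict(Counter)
    for region, group in assignments:
        by_region[region][group] += 1
    return {
        region: {g: round(100 * n / sum(counts.values())) for g, n in counts.most_common()}
        for region, counts in by_region.items()
    }

# Hypothetical toy data: 3 varieties from region A, 10 from region B
toy = ([("Region A", 3)] * 3
       + [("Region B", 3)] * 6 + [("Region B", 8)] * 3 + [("Region B", 7)])
print(group_distribution(toy))
```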



Analysing the number of insertions in the different groups, an increasing number of
possible insertion sites was found from the coastal part of Tanzania (East) towards
Central Kenya. The highest number of possible insertion sites was found in Group 1
(203). Around 16% of the investigated clones were found in that group, with the
highest representation of Central Kenyan samples (43%). In Groups 2, 3 and 7 the
numbers of possible insertion sites were 164, 161 and 162, respectively, whereas
Groups 6 and 8, in which the East Tanzanian samples predominate, had only around
60-90 possible insertion sites (see Table 7 and FIG. 9). Table 7 shows only the most
characteristic groups, because the others (4, 5 and 10) are too small or too diverse
(see also Table 6).
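The text does not define how the "number of possible insertion sites" of a group is obtained. One plausible reading, assumed here, is the number of Str187 insertion categories observed in at least one member of the group, that is, the union of the members' band profiles. The function name and the toy matrix below are hypothetical.

```python
def possible_insertion_sites(matrix, members):
    """Number of insertion categories present in at least one member of a group.

    matrix: {variety: [0/1 per insertion category]}; members: varieties in the group.
    """
    profiles = [matrix[v] for v in members]
    # zip(*profiles) walks the matrix column by column (one column per category)
    return sum(1 for column in zip(*profiles) if any(column))

# Hypothetical profiles over 4 insertion categories
matrix = {"v1": [1, 0, 0, 1], "v2": [0, 0, 1, 1], "v3": [0, 0, 0, 1]}
print(possible_insertion_sites(matrix, ["v1", "v2"]))
print(possible_insertion_sites(matrix, ["v3"]))
```

Under this reading, a group's count can only grow as members are added, so very small groups tend to show fewer possible sites.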

As already mentioned, retrotransposons transpose via an RNA intermediate, which
means that the parental insertion remains fixed in the genome. Therefore every further
insertion must have happened later and reflects a more recent change in the genome.
Following this reasoning, the spread of a retrotransposon across a geographical area
can be traced; in this case, the spread of the Str187 retrotransposon can be followed in
space and time. It is supposed that the area where the number of insertions of the
given retrotransposon is lowest is the starting point of its spread in a given region.
Following this theory, and based on the increasing number of insertions observed, it is
proposed that sweet potato in East Africa occurred first in East Tanzania (80-90
insertions) and spread further to Lake and North Tanzania, Central/Western Uganda
(161 insertions), East/Northeast Uganda and Kenya (FIG. 11), going around Lake
Victoria. In Kenya and Northeast Uganda, three distribution areas with different
insertion rates were found. Varieties from the Central, Western and Nyanza areas of
Kenya are grouped into Groups 1, 2 or 7 together with the Northeast Ugandan clones,
where the numbers of retrotransposon insertions are 203, 164 or 162, respectively (see
Table 7). These results could suggest that in Kenya part of the varieties were exposed
to different biotic and abiotic effects, which could induce retrotransposon expression
resulting in new insertions.

Considering that sweet potato was introduced into Africa no more than five hundred
years ago, and that during this time the number of retrotransposon insertions could
have increased 2-3-fold in the African material, it can be supposed that Str187 is a
still-mobile retrotransposon.
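The proposed direction of spread amounts to ordering the groups by their number of possible insertion sites. The sketch below applies that ordering to the Table 7 values and expresses each group relative to the lowest one; it only restates the reasoning above and does not test it.

```python
# Possible insertion sites per UPGMA group, as listed in Table 7
sites = {1: 203, 2: 164, 3: 161, 6: 93, 7: 162, 8: 82}

# Under the reasoning above, fewer insertions = earlier stage of the spread
order = sorted(sites, key=sites.get)
fold = {g: round(sites[g] / sites[order[0]], 1) for g in order}

print(order)   # groups from proposed origin to most recent
print(fold)    # insertion sites relative to the lowest group
```

Group 1 carries about 2.5 times as many possible insertion sites as Group 8 (about 2.2 times relative to Group 6), in line with the 2-3-fold increase mentioned above.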
